
    A Modeling Approach based on UML/MARTE for GPU Architecture

    Nowadays, high-performance computing is part of the embedded-systems context. Graphics Processing Units (GPUs) are increasingly used to accelerate a wide range of algorithms and applications. Over the past years, however, little effort has been devoted to describing abstractions of applications in relation to their target architectures. Thus, when developers need to map applications onto GPUs, for example, they find it difficult and fall back on the APIs of these architectures. This paper presents a metamodel extension for the MARTE profile and a model for GPU architectures. The main goal is to specify task and data allocation in the memory hierarchy of these architectures. The results show that this approach helps generate code for GPUs based on model transformations using Model Driven Engineering (MDE). Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011)

    An autoadaptative limited memory Broyden's method to solve systems of nonlinear equations

    We propose a new Broyden-like method that we call the autoadaptative limited memory method. Unlike classical limited memory methods, we do not need to set parameters such as the maximal subspace size the solver can use. Instead, the autoadaptative algorithm automatically enlarges the approximation subspace when the convergence rate decreases. The convergence of this algorithm is superlinear under classical hypotheses. A few numerical results on well-known benchmark functions are also provided and show the efficiency of the method
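
The autoadaptative variant manages the subspace size automatically; as background, the classical "good" Broyden update it builds on can be sketched as follows. This is a minimal illustration with a made-up 2x2 test system, not the authors' algorithm:

```python
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 with Broyden's 'good' method: the Jacobian
    approximation B is updated from secant pairs instead of being
    recomputed at every step."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                 # initial Jacobian approximation
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(B, -Fx)    # quasi-Newton step
        x = x + s
        Fx_new = F(x)
        y = Fx_new - Fx
        # rank-one secant update, enforcing B_new @ s = y
        B += np.outer(y - B @ s, s) / (s @ s)
        Fx = Fx_new
    return x

# example system: intersect a circle with the line x = y
def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 2.0, x - y])

root = broyden(F, [2.0, 0.5])
```

A limited-memory variant would store the recent update pairs (s, y) instead of the dense matrix B; the autoadaptative method additionally decides how many pairs to keep based on the observed convergence rate.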

    Programming Massively Parallel Architectures using MARTE: a Case Study

    Nowadays, several industrial applications are being ported to parallel architectures, taking advantage of the potential parallelism provided by multi-core processors. Many-core processors, especially GPUs (Graphics Processing Units), have led the race in floating-point performance since 2003. While the performance improvement of general-purpose microprocessors has slowed significantly, GPUs have continued to improve relentlessly. As of 2009, the ratio of peak floating-point throughput between many-core GPUs and multi-core CPUs is about 10 to 1. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Aiming to improve the use of many-core processors, this work presents a case study using UML and the MARTE profile to specify and generate OpenCL code for intensive signal processing applications. Benchmark results show the viability of using MDE approaches to generate GPU applications

    Component-based Models Going Generic: the MARTE Case-Study

    One of the reasons for using component-based modeling is to improve reusability. However, there are cases where a whole component cannot be reused just because one element of its internal structure does not present the required features (e.g., type, multiplicity, etc.). In this paper, we propose the use of parameterized components as a way to address this problem, and thus get a further boost in reusability. The UML specification supports parameterization via templates. However, when it comes to component-based modeling, UML is but the first metamodel in sometimes long chains of transformations comprising other domain metamodels. So, in order to keep parameters deeper down the transformation chains, we introduce generic components in those metamodels. Instead of changing the target metamodel, however, we decided to create an independent metamodel with the additional concepts required by parameterization, so that it can be attached to any target metamodel. The most obvious advantage of this approach is that we do not have to touch the target metamodel. We also demonstrate how existing transformations can be easily adapted to accept the parameter-related concepts. To illustrate our ideas, we used OMG's MARTE metamodel for real-time and embedded systems. The approach has been validated through transformations written in QVT

    Automatic Multi-GPU Code Generation applied to Simulation of Electrical Machines

    Electrical and electronic engineering has long used parallel programming to solve its large-scale, complex problems for performance reasons. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we propose an approach to generate code for hybrid architectures (e.g. CPU + GPU) using OpenCL, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and the MARTE profile, a standard proposed by the Object Management Group (OMG). The aim is to provide resources for non-specialists in parallel programming to implement their applications. Moreover, thanks to the reuse capacity of models, we can add or change functionalities or the target architecture. Consequently, this approach helps industries meet their time-to-market constraints, and experimental tests confirm performance improvements in multi-GPU environments. Comment: Compumag 201

    Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications

    Graphics Processor Units (GPUs) are known for offering high performance and power efficiency for processing algorithms that suit their massively parallel architecture. Unfortunately, as parallel programming for this kind of architecture requires a complex distribution of tasks and data, developers find it difficult to implement their applications effectively. Although approaches based on source-to-source and model-to-source transformations have aimed to provide a low learning curve for parallel programming and to take advantage of architecture features to create optimized applications, programming remains difficult for neophytes. A Model Driven Engineering (MDE) approach for GPUs intends to hide the low-level details of GPU programming by automatically generating the application from high-level specifications. However, the application designer should take into account some adjustments to the source code to achieve better performance at runtime. Directly modifying the generated source code goes against the MDE philosophy. Moreover, the designer does not necessarily have the required knowledge to effectively modify the generated GPU code. This work aims at improving performance by feeding back into the high-level models specific execution data from a profiling tool, enhanced by smart advice from an analysis engine. In order to keep the link between execution and model, the process is based on a traceability mechanism. Once the model is automatically annotated, it can be refactored with the aim of improving performance in the re-generated code. Hence, this work allows us to keep coherence between model and code without forgetting to harness the power of GPUs.
To illustrate and clarify key points of this approach, an experimental example set in a transformation chain from UML-MARTE models to OpenCL code is provided

    A Deflated Version of the Conjugate Gradient Algorithm

    We present a deflated version of the conjugate gradient algorithm for solving linear systems. The new algorithm can be useful when a small number of eigenvalues of the iteration matrix are very close to the origin. It can also be useful when solving linear systems with multiple right-hand sides, since the eigenvalue information gathered while solving one linear system can be recycled for solving the next systems and then updated
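
The deflation idea can be sketched in a few lines. This follows the common Galerkin-projection formulation of deflated CG, not necessarily the paper's exact variant; the function and the example matrix below are illustrative:

```python
import numpy as np

def deflated_cg(A, b, W, tol=1e-10, max_iter=200):
    """CG for SPD A with deflation of span(W), where the columns of W
    are (approximate) eigenvectors for eigenvalues near the origin."""
    AW = A @ W
    mu = W.T @ AW                        # small k x k Galerkin matrix
    solve_mu = lambda v: np.linalg.solve(mu, v)
    x = W @ solve_mu(W.T @ b)            # exact solve on span(W)
    r = b - A @ x                        # r is orthogonal to W
    # keep search directions A-orthogonal to span(W)
    project = lambda v: v - W @ solve_mu(AW.T @ v)
    p = project(r)
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = project(r) + (rs_new / rs) * p
        rs = rs_new
    return x

# illustrative SPD system with two eigenvalues very close to the origin
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = Q @ np.diag(np.r_[[1e-4, 1e-3], np.linspace(1, 10, 48)]) @ Q.T
b = rng.standard_normal(50)
W = Q[:, :2]                 # eigenvectors of the two small eigenvalues
x = deflated_cg(A, b, W)
```

Deflating the two near-zero eigenvalues leaves an effective condition number of about 10, so CG converges quickly instead of stalling; for multiple right-hand sides, W can be reused and refined from one solve to the next.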

    An MDE Approach for Automatic Code Generation from MARTE to OpenCL

    Advanced engineering and scientific communities have used parallel programming to solve their large-scale complex problems; achieving high performance is the main advantage of this choice. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Thus, in order to reduce design complexity, we propose an approach to generate code for the OpenCL API, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and on Modeling and Analysis of Real-Time and Embedded Systems (MARTE), a standard proposed by the Object Management Group (OMG). The aim is to provide resources for non-specialists in parallel programming to implement their applications. Moreover, concepts like reuse and platform independence are present: once we have designed an application and an execution-platform architecture, we can reuse the same project to add more functionalities and/or change the target architecture. Consequently, this approach helps industries meet their time-to-market constraints. The resulting code, for both the host and the compute devices, consists of compilable source files that satisfy the specifications defined at design time
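
The chain ends in model-to-text transformation: platform-independent model elements are rendered into OpenCL source. As a loose illustration of that last step (a toy template-based generator, not the authors' tooling; the model dictionary and kernel are hypothetical):

```python
from string import Template

# A toy "model": the kind of information a UML/MARTE design would
# carry once the transformation chain has flattened it.
model = {
    "kernel": "vector_add",
    "dtype": "float",
    "size": 1024,
}

# Model-to-text template producing an OpenCL C kernel.
KERNEL_TEMPLATE = Template("""\
__kernel void $kernel(__global const $dtype* a,
                      __global const $dtype* b,
                      __global $dtype* out) {
    int gid = get_global_id(0);
    if (gid < $size) {
        out[gid] = a[gid] + b[gid];
    }
}
""")

source = KERNEL_TEMPLATE.substitute(model)
```

Changing the target architecture then amounts to swapping the template set while reusing the same model, which is the reuse/platform-independence argument made above.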

    Using ArrayOL to Identify Potentially Shareable Data in Thread Work-Groups of GPUs

    Over recent years, using Graphics Processing Units (GPUs) has become an effective method for increasing the performance of many applications. However, these performance benefits from GPUs come at a price. First, extensive programming expertise and intimate knowledge of the underlying hardware are essential for gaining good speedups. Second, the expressibility of GPU-based programs is not powerful enough to retain the high-level abstractions of the solutions. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices while retaining the programming abstractions. To this end, performing a model-to-source transformation, whereby a high-level language is mapped to CUDA or OpenCL, is an attractive option. In particular, it enables developers to harness the power of GPUs without any expertise in GPGPU programming. In this work, we propose an approach based on MDE and ArrayOL to detect shareable data zones. The tilers from ArrayOL, which express the data parallelism of repetitive tasks, are analyzed at compile time to create areas of shared data. Identifying these areas is crucial, as it allows us to load data into shared memory areas, which have high throughput. Consequently, automatically generated programs should achieve performance comparable to manually well-written programs
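
In ArrayOL, a tiler is given by an origin vector and paving and fitting matrices: repetition r reads the elements origin + P·r + F·i (modulo the array shape) over all pattern indices i. A minimal sketch of the underlying analysis, computing footprints and the elements touched by several repetitions (candidates for shared memory); the helper name and the 1-D stencil example are illustrative, not the paper's implementation:

```python
import numpy as np
from itertools import product
from collections import Counter

def tile_footprint(origin, fitting, pattern_shape, rep_index, paving, array_shape):
    """Array elements read by one repetition of an ArrayOL tiler:
    origin + paving @ rep_index + fitting @ i, modulo the array
    shape, for every pattern index i."""
    base = np.asarray(origin) + np.asarray(paving) @ np.asarray(rep_index)
    pts = set()
    for i in product(*(range(s) for s in pattern_shape)):
        p = (base + np.asarray(fitting) @ np.asarray(i)) % np.asarray(array_shape)
        pts.add(tuple(int(v) for v in p))
    return pts

# 1-D example: a 3-point stencil sliding with stride 1 over 8 elements
footprints = [
    tile_footprint(origin=[0], fitting=[[1]], pattern_shape=(3,),
                   rep_index=[r], paving=[[1]], array_shape=[8])
    for r in range(4)
]

# elements touched by more than one repetition: worth staging in
# the work-group's shared (local) memory
counts = Counter(p for fp in footprints for p in fp)
shareable = {p for p, c in counts.items() if c > 1}
```

Because the tiler matrices are known statically, this overlap analysis can run entirely at compile time, which is what lets the generator emit the local-memory staging code automatically.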